[https://nvbugs/6225775][fix] Fix spec count graph by chuangz0 · Pull Request #15212 · NVIDIA/TensorRT-LLM

chuangz0 · 2026-06-10T08:41:03Z

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

CUDA graph warmup can capture speculative sampling without the generated-token count frequency-penalty path when warmup requests have no frequency penalty. Later RWLT GPT-OSS disagg requests replay that graph with frequency_penalty and prompt_ignore_length, so repeated generated tokens are not penalized.

Add speculative logits penalty CUDA ops, preserve sequence-slot count state across CUDA graph metadata/replay, append accepted tokens back into count state, and gate forced graph count capture to the disaggregated generation role by default.

Validation: python3 -m py_compile on modified Python modules; git diff --cached --check; GPT-OSS disagg original NVBug config ran 8 total auto-gating runs with >10k=0 and 16K/length=0.
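The failure mode described in this commit message can be illustrated with a small pure-Python sketch. It is not the code in this PR: `SlotCounts`, `apply_frequency_penalty`, `capture`, and the `force_count_path` flag are hypothetical names, and the "graph" is simulated by a closure that freezes the branch chosen at warmup. The penalty uses the common definition (subtract `frequency_penalty * count` from each token's logit), which is assumed here; `prompt_ignore_length` handling is left out.

```python
from collections import Counter


class SlotCounts:
    """Hypothetical per-sequence-slot token counts (not TensorRT-LLM's API)."""

    def __init__(self, num_slots):
        self.counts = [Counter() for _ in range(num_slots)]

    def append_accepted(self, slot, tokens):
        # Accepted speculative tokens must be fed back so later steps see them.
        self.counts[slot].update(tokens)


def apply_frequency_penalty(logits, counts, frequency_penalty):
    # Assumed definition: logit[t] -= frequency_penalty * count[t].
    return [logit - frequency_penalty * counts[tok]
            for tok, logit in enumerate(logits)]


def capture(warmup_has_penalty, force_count_path=False):
    """Simulate graph capture: the branch taken at warmup is frozen for replay."""
    use_counts = warmup_has_penalty or force_count_path

    def replay(logits, counts, frequency_penalty):
        if use_counts:
            return apply_frequency_penalty(logits, counts, frequency_penalty)
        # The penalty path was never captured, so replay cannot run it.
        return list(logits)

    return replay


state = SlotCounts(num_slots=2)
state.append_accepted(0, [3, 3, 1])  # token 3 generated twice, token 1 once
logits = [0.0, 1.0, 0.5, 2.0]

# Warmup without a frequency penalty: replay ignores the penalty entirely.
stale = capture(warmup_has_penalty=False)
# Forcing the count path at capture time keeps the penalty available on replay.
fixed = capture(warmup_has_penalty=False, force_count_path=True)

print(stale(logits, state.counts[0], 0.5))
print(fixed(logits, state.counts[0], 0.5))
```

In this sketch, `stale` returns the logits unchanged even though the request asks for a penalty, which mirrors the reported symptom. `fixed` penalizes tokens in proportion to their counts, and only stays correct if `append_accepted` is called after every verification step.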

dongfengy and others added 2 commits May 29, 2026 02:39

[https://nvbugs/6168859][fix] move tinygemm PDL release after reducti…

118b6f0

…on (NVIDIA#14537) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

chuangz0 requested review from a team as code owners June 10, 2026 08:41

chuangz0 requested review from byshiue and cascade812 and removed request for a team June 10, 2026 08:41

github-actions Bot assigned chuangz0 Jun 10, 2026

chuangz0 wants to merge 2 commits into NVIDIA:feat/bench_xfrom from chuangz0:fix/spec-count-graph

2 participants
